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1.  Introduction 

In  the  last  few  years  there  has  been  a  sizable  research 
effort  in  the  area  of  data  models.  This  is  due  in  part  to  the 
importance  of  the  subject  conceptually,  philcscphicall y  and  in 
practice.  Before  cne  can  attempt  to  understand,  design, 
implement  or  simply  use  a  Data  Ease  Management  System  (DBMS),  one 
needs  a  data  mcdel.  In  addition,  not  all  people  think  alike,  so 
that  different  intellectual  tools  may  appeal  to  different  people. 
Hence,  different  data  models  may  be  needed. 

Many  data  models  have  already  been  proposed,  each  having  its 
own  concepts  and  terminology.  In  studying  the  various  models  it 
becomes  apparent  that  they  have  similarities  and  differences 
which  are  not  trivial  to  analy2e.  This  paper  attempts  to  study  a 
collection  of  data  models  using  a  common  framework. 

We  are  not  proposing  a  meta-model  capable  of  capturing  all 
the  features  of  every  model.  Nor  are  we  promoting  any  particular 
model  as  being  the  best  in  any  way.  Cur  goal  is  to  point  cut  the 
ccncepts  and  features  of  each  model  and  hew  they  relate  to  other 
models.  In  this  way  we  hope  to  aid  the  prospective  data  model 
builder  to  place  his  own  model  in  perspective.  But  most 
important  of  all,  we  hope  to  help  the  DBMS  user  understand  what 
is  available  sc  that  he  may  choose  his  own  conceptual  tools 
according  to  his  tastes  and  needs. 
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One  important  aspect  of  the  classification  deals  with  data 
iBcdel  structure.  First,  we  are  interested  in  whether  a  model  has 
an  underlying  graph.  The  graph,  while  helping  to  visualize 
overall  infomation  organization,  introduces  the  problem  of  our 
understanding  the  meaning  of  a  node  or  arc.  For  example,  a  node 
may  represent  a  set,  a  concept,  an  event,  or  a  binary  relation, 
while  an  arc  may  stand  for  an  attribute,  a  function,  a  mapping,  a 
verb,  or  a  structural  link.  le  summarize  the  graph  theoretical 
terminology  in  Table  I. 

Second,  the  data  models  without  graph  structure  tend  to  be 
set  oriented.  In  these  models  we  are  interested  in  the  structure 
of  these  sets,  ie.,  whether  they  are  "nested”  or  "flat". 

The  latheiatical  foundations  of  the  data  models  center  on 
relations.  Basically,  the  models  can  be  classified  by  their  use 
of  binary  relations,  n-ary  relations,  or  a  combination  cf  both. 
A  fundamental  construct  interrelates  the  various  mathematical 


f crmu latic 

cs,  namely,  that 

a  binary  relation 

f  rom  a 

se  t 

A  to  a 

set  E  can 

te  viewed  either  as 

a  "mapping" 

of  elements 

of 

A  to 

these  of 

E,  or  as  a  subset 

(of  ordered 

pairs) 

of  t  he 

Cartesian 

prcduct  cf 

A  and  E, 

The  structures  of  the  models  have  implications  on  the  ways 
in  which  data  can  be  accessed.  In  the  section  on  logical  access 
we  distinguish  four  types  of  access  and  we  consider  how  the  data 
model  structures  support  these  access  types. 


Next,  we  study  the  basic  concepts  used  by  the  data  models, 
and  attempt  to  establish  the  similarities  and  differences  amonq 
both  the  concepts  and  data  model  terminology. 

Lastly,  we  deal  with  tbe  semantics  of  the  data  models.  By 
••semantics"  we  mean  the  extent  to  which  the  information 
structures  of  a  data  model  convey  the  meaning  of  the  data 
ccr.tained  in  the  data  base. 

liili  Cat  a  ijcdels  un^r  Stud  y 

In  all,  we  discuss  twenty-three  data  models.  We  concentrate 
on  and  summarire  the  features  of  fifteen  of  these  and  give 
references  to  descriptions  of  the  others.  The  fifteen  summarized 
are: 

1.  The  Eelaticnal  Data  hcdel  [Codd,  1970,1971] 

2.  The  EETG/CCDASYL  Model  [CODitSYL,  1971] 

3.  The  Hierarchical  Model  [IMS, System  20CC] 

h.  The  DIAM  II  Model  [Senko,  1974] 

5.  The  Infological  Model  [Lacgefors,  1974;  Sundgren,  1975] 

I 

6.  The  Binary  Logical  Association  Model  [Eracchi, 
et.  al.,  1976] 

7.  The  Fntity-Eelaticnship  Mcdel  [Chen,  1976] 

8.  Data  Semantics  [Atrial,  1974] 

9.  The  Functional  Data  Mcdel  [ Ni jssen,  1975; 

Kerschterg  &  Pacheco,  1975] 

10.  The  Semantic  Network  Model  [ Roussopoulos  6  Hylopoulos, 


1  975  ] 
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11.  The  Extended  Set  Theory  Model  [Childe,  1968,  1974] 

12.  Information  Hanagefrent  Concepts  [  Durchholz  and 
Bichter,  1974,  1976] 

13.  The  Information  Algebra  Model  [Kobayashi,  1975,  and  CODASYL,  1962] 

14.  The  Data  Space  Model  [Bachman,  1973] 

15.  The  Lindgreen  Model  [ lindgreen ,  1 974  ] 

The  order  of  presentation  does  not  imply  a  particular  model 
preference  on  cur  part.  »e  do  feel,  however,  that  the  first  four 
models  are  perhaps  the  best  known  to  readers,  and  we  will 
therefore  present  them  first. 

For  each  of  the  above  models,  we  present  a  brief 
description.  The  interested  reader  should  refer  to  the 
literature  for  more  details. 

2 . 1  T^g  a  b icjja^  Dat  a  Model 

The  relational  data  model  is  a  formal  model  which  represents 
data  item  relationships  as  n-ary  relations.  Consider  a  set  of 
dcjapng  S1,S2,,..,Sn  (not  necessarily  distinct).  A  relation,  E, 
on  these  n  sets  is  a  set  of  n-tuples  or  simply  tuples  such  that 
each  tuple  has  its  first  element  from  SI,  its  second  from  S2, 
etc.  The  relation  R  may  also  be  considered  a  subset  of  the 
Cartesian  product  SI xS2x. . . xSn.  The  set  Si  is  referred  to  as  the 
i-th  dciain  of  B,  and  B  is  said  to  be  of  degree  n. 

When  a  relation  P  is  represented  as  a  table,  the  columns  of 
the  table,  called  the  attributes,  correspond  to  the  domains  of 
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the  relation.  Ihe  attributes  are  usually  naioed  according  to 
their  domain,  however,  they  may  have  role  names  to  distinguish 
aucng  attributes  with  the  same  underlying  domain,  thereby 
maintaining  the  condition  that  attribute  names  within  relations 
be  unique. 

Each  row  of  the  table  corresponding  to  relation  H  represents 
a  tuple  of  P.  The  rows  are  distinct  because  P  is  a  set  of 
distinct  elements.  The  ordering  of  rows  is  immaterial,  as  is  the 
ordering  cf  columns,  when  they  are  referred  to  by  name  rather 
than  their  relative  table  positions. 

key,  K ,  of  a  relation,  E,  is  a  subset  of  attributes  of  P 
with  the  following  time-independent  properties  [Codd,  1971]: 

identification.  The  value  of  K  uniquely 
identifies  each  tuple  of  F, 

2.  Non -redundancy.  No  attribute  in  K  can  be  discarded 
without  violating  property  FI. 

"P  relation  P  may  have  several  candidate  keys  and  one  of 
these  is  chcser.  as  the  ^timarj  kej. 

In  the  relational  data  model,  data  access  may  be  performed 
either  by  the  relational  algebra  or  the  relational  calculus 
[Codd,  1S71b].  The  relational  algebra  consists  of  a  set  of 
relational  operators  which  take  relations  as  arguments  and  yield 
another  relation.  In  the  relational  calculus  the  definition  of 
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the  desired  resultant  relation  is  given  in  terms  of  a  predicate 
formulated  in  the  first-order  Predicate  Calculus. 

high-level  ncn- proced u ra 1  languages  such  as  SQUARE  [ Eoyce  et 
al.,  1975]  and  SEQUEL  [Chamberlin  and  Boyce,  1974]  have  teen 
developed  for  the  relational  model. 

2.2  The  DETG/CCD^YL  Model 

In  the  CCCASYL  DETG  proposal  there  is  an  inherent  data  model 
[Ccdasyl  EETG,  1971].  We  will  try  to  outline  this  model.  We 
will  not  elatorate  on  certain  details  and  peculiarities  of  the 
CETG  proposal,  tut  will  ccncertrate  mainly  on  what  we  con  side  r  to 
be  the  basic  issues  and  concepts. 

Ihe  two  basic  concepts  in  the  DETG  data  model  are  the  record 
t_y_pe  and  DETG  set  (sometimes  referred  to  as  coset  to  avoid 
ccnfusion  with  the  accepted  mathematical  meaning  of  the  word 
"set").  The  record  type  is  a  generic  concept  representing  a 
ccllection  of  record  occurrences.  Each  record  type  consists  of  a 
number  of  data  items.  Data  items  have  data  item  vd^lues  in 
individual  record  type  occurrences.  For  instance,  the  record 
type  EMPLOYEE  can  consist  of  the  data  items  NAME,  ADDRESS,  SKILL, 
S^^LARY.  Pi  particular  record  occurrence  can  have  the  data  item 
values  (Smith  Jchn,  190  St. George,  Welder,  15,000).  /According  to 
the  CETG  proposal,  record  types  can  also  have  aggregates,  vectors 
and  repeating  groups.  We  consider  these  notions  inessential. 
They  are  options  of  the  proposed  DDL  rather  than  basic  concepts 
of  the  data  model.  It  should  be  pointed  out  that  record  types 
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are  logical  and  are  not  tied  to  any  file  or  ether  particular 
iaplercentation. 

EETG  sets  serve  as  connections  between  the  record  types.  A 
DETG  set  connects  an  owner  record  type  with  a  gember  record  type. 
The  DETG  set  represents  a  set  of  relationships  between  record 
occurrences  of  the  owner  record  type  and  the  member  record  type. 
Two  restrictions  are  imposed  on  these  relationships.  First,  any 
record  occurrence  of  the  member  record  type  is  related  to  at  most 
one  record  occurrence  of  the  owner  record  type.  In  this  way,  an 
owner  record  occurrence  "owns”  a  collection  of  member  record 
occurrences.  Second,  the  owner  record  type  and  the  member  record 
type  of  a  CETG  set  cannot  be  the  same.  This  last  restriction  is 
not  particularly  crucial  for  the  DETG  model.  However,  it  is 
rather  important  for  the  proposed  COBOL  DKL.  /According  to  the 
DETG  proposal  a  DBTG  set  can  have  more  than  one  membei  record 
type.  However,  we  consider  this  added  feature  inessential  for 
tha  data  model. 


^  structure  diagram  is 

DETG  model  schema.  Each  node  cor 
directed  arc  corresponds  to  a  DBT 
is  frem  the  owner  record  type  to 
are  labelled  with  the  name  of  t 
that  the  arc  implies  a  function  f 
meirber  record  type  to  record 
type.  According  to  the  second  re 
same  source  and  destination. 


a  directed  graph  repress 
responds  tc  a  record  type 
G  set.  The  direction  of 
the  member  record  type, 
he  corresponding  DETG  set 
rom  record  occurrences 
occurrences  of  the  owner 
striction  no  arc  can  ha 


nting  a 
.  Each 
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2.3  The  Hierarchical  Model 

Ihe  hierarchical  icodel  is  not  usually  proposed  as  a 
ccnceptual  data  nicdel,  but  it  is  inherent  in  the  facilities  of  a 
number  of  EEKS's,  e.g.,  IMS  [lEM,  1975]  and  SYSTEM  2000  [MPI]. 
We  will  try  to  extract  the  basic  concepts  which  are  essential  for 
the  understanding  of  the  structure  and  the  data  access  features 
in  these  systeits, 

Ihe  two  basic  concepts  of  the  hierarchical  data  model  are 
the  record  t^ije  and  the  parent-child  relaticnship.  The  record 
type  is  sometimes  referred  to  as  a  segment  t ype  with  segment 
occurrences,  etc.  However,  the  ideas  cf  record  type,  as 
previously  defined  in  the  DETG  model,  and  segment  types  as  used 
in  hierarchical  systems  are  essentially  identical.  A  parent- 
child  relationship  connects  two  record  types  and  represents  a  set 
of  relationships  between  record  occurrences  of  the  two  record 
types.  Ats  in  the  previous  case  of  the  DBTG  model,  we  distinguish 
between  the  rcles  cf  the  record  types  as  related  within  a  parent- 
child  relationship.  Cne  record  type,  corresponding  to  the  DBTG 
owner,  is  the  parent  record  type.  The  other,  corresponding  to 
the  EETG  member,  is  the  child  record  type.  That  is,  every  record 
occurrence  cf  the  parent  type  can  have  zero,  one  or  more  related 
record  occurrences  of  the  child  record  type.  In  other  words,  a 
parent  child  relationship  represents  a  function  from  occurrences 
of  the  child  record  type  to  occurrences  of  the  parent  record 
type-  As  an  added  restriction  this  function  is  total.  That  is, 
every  record  occurrence  of  the  child  record  type  cannot  exist 
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without  being  related  to  exactly  one  occurrence  of  the  parent 
record  type. 

Given  a  set  of  record  types  the  hierarchical  data  model 
imposes  severe  restrictions  on  the  parent-child  relationships 
defined  among  them.  First,  there  is  at  mcst  cne  parent-child 
relationship  between  any  two  record  types.  As  a  result,  parent- 
child  relationships  need  not  be  explicitly  named.  Source  and 
destinaticn  identifies  them  uniquely.  Second,  all  the  parent- 
child  relationships  defined  among  the  record  types  form  a  data 
structure  diagram  which  is  a  tree.  In  addition,  the  direction  of 
all  the  arcs  is  towards  the  leaves.  Such  limited  data  structure 
diagrams  are  also  called  ilerarch  ical  definition  1  r^s .  They 
represent  all  the  parent-child  relationships  among  the  record 
types.  As  a  result,  they  define  all  the  possible  connections 
among  record  occurrences  in  a  hierarchical  data  base. 

2.^  I  he  CIAb  II  Kcdel 

The  complete  CIAH-II  model  spans  five  different  levels:  the 
Fnd-user  Level,  the  Information  level,  the  String  Level,  the 
Encoding  level,  and  the  Physical  Device  Level.  However,  we  are 
restricting  our  attention  only  to  the  Information  Level,  which  is 
comparable  to  the  Conceptual  Level  of  ANSI/SFAPC. 

The  latest  published  version  of  DIAM  uses  the  terminology  of 
SPARC.  For  the  real  world  realm  this  consists  of  terms  such  as 
entity,  property,  lact,  entity  set,  and  enterprise.  At  the 
information  level,  names  of  things  are  dealt  with,  hence  the 


terms  used  are  attribute  name,  attribute  value,  identifier  name, 
identifier  value,  and  fact  representation . 

A  fact  representation  is  a  pair  of  attribute  names,  also 
called  an  association  ^air,  linking  two  identifiers.  The 
association  pair  is  symmetric.  That  is,  one  attribute  name  uses 
one  identifier  as  the  subject  with  the  other  identifier  as  the 
value,  while  the  ether  attribute  name  does  just  the  opposite. 

Using  a  symmetric  pair  cf  asscciations  serves  two  purposes. 
First,  it  removes  any  bias  as  to  which  object  is  being  described 
and  which  object  is  doing  the  describing.  Second,  both  of  the 
associations  are  needed  for  retrieval  from  the  data  base. 

Cf  course  it  is  recognized  that  not  all  facts  are  binary  in 
nature,  Fcr  example,  a  f ather/mcther/child  relationship  in  a 
polygamous  society  or  the  status  of  an  employee  working  on  a 
project  (where  employee  -  project  is  a  many-to-many 
relationship).  In  binary  association  models  n-ary  facts  become 
themselves  entities.  In  FOP^'L,  the  language  associated  with 
Cli^M-II,  the  user  does  not  need  to  be  aware  of  these  "fact 
entities”.  This  is  accomplished  by  use  of  the  VIA  declaration 
attribute  which  essentially  declares  certain  paths  in  the  diagram 
to  be  commutative. 


2 . 5  The  I nf clcqical  Model 
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Accordinq  to  this  ttodel  an  information  system  is  designed  to 
receive,  retain  and  produce  information  atout  a  slice  of  reality, 
called  the  object  system. 

The  Infolcgical  Approach  to  data  base  management  has  been 
suggested  by  Langefors  [1974]  and  Sundgren  [1S75].  The  onject 
system  is  ccmposed  of  objects ,  properties  and  relations.  Objects 
have  properties  and  are  related  to  each  other  at  points  of  time 
and  during  time  intervals.  An  elementary  situation,  or  e- 
*  is  an  elementary  property  (relation)  holding  at  an 
identified  object  (n-tuple  of  objects)  at  a  certain  time  in  the 
object  system.  The  two  types  of  e-situations  may  be  represented, 
respectively,  by  triples  <o,p,t>  and  <<o1 , . . . , on>, r, t>  where 
o,o1,...,cr  are  objects,  p  is  a  property,  r  is  a  relation  and  t 
is  a  point  or  period  of  time.  The  set  of  objects  is  not  a 
disjoint  class:  e-situations,  properties  and  relations  may  be 
considered  objects. 

What  humans  use  when  they  perceive  and  think  about  an  object 
system  are  references,  which  are  conceptual,  mental  entities. 
Eeferences  may  be  combined  in  many  ways  into  reference 
expressions.  A  reference  expression  referring  to  an  e-situation 
is  called  an  e-message.  An  e-message  is  the  basic  unit  of 
information  in  an  information  system.  A  property  type  e-message 
may  be  represented  by  a  triple  <p  (object) , p  (proper t y) , p  (time) >, 
where  p  denotes  a  reference.  An  example  of  a  property  type  e- 
message  is  "car  number  27,  mileage= 1 5000  miles,  Oct. 22,  1974",  A 
relational  e-message  is  represented  by  a  triple 
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<<p  (obj.  1 p  (obj ,  n)  >,  p  (obj.  relation)  ,  p  ( time)  >  .  An  example  is 
"the  youngest  cousin  of  George  married  the  fair-haired  sister  of 
Eill  the  same  day  as  nary’s  grandmother  died”.  Property 
references  may  have  the  form  of  a  pair  <p  (attribute)  , p  (value)  >  as 
in  the  first  example  above. 

As  a  further  step  in  comprehending  the  object  system, 
objects  are  organixed  into  classes  and  values  are  organized  into 
attributes.  There  is  a  corresponding  classification  of  e- 
situaticns  into  e-situaticn  types.  At  the  level  of  references, 
e-messages  are  classified  into  e-message  types  called  elementary 
concepts,  or  e-cpncept s.  E-ccncepts  are  represented  by  pairs 
<F(x)  »F(y)>  where  x  is  an  object  (n-tuple  of  objects)  and  y  is  an 
attribute  (object  relation).  There  are  also  versions  of  e- 
concepts  in  which  a  time  reference  is  present. 

In  the  real  wcrld  certain  e-situaticns  are  in  causal  or 
precedence  relations  with  other  e-sitations.  This  is  reflected 
in  the  information  system  by  the  existence  of  precedence 
relations  among  e-messages.  The  elementary  process,  or  e- 
^rccess,  is  defined  tc  be  a  procedure  which  generates  a  certain 
e-messaqe  frcro  its  e-message  precedents. 

The  e-messages  in  an  information  system  reside  in  a  data 
base  (still  an  infological  level  concept).  The  main  subsystems 
of  an  infological  data  base  are  the  nucleus,  the  schema  and  the 
filter.  The  nucleus  is  the  set  of  e-messages  which  at  the 
datalogical  level  are  explicitly  represented  by  data  in  the 
physically  existing  files.  The  schema  embodies  the  e-processes 
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The  ttajor  type  of  transaction  which  leaves  the  data  base 
invariant  is  the  query  which  rormally  results  in  a  nontrivial 
reply.  Among  queries,  three  types  are  distinguished;  yes/no 
queries,  retrieval  queries  and  process  queries.  Yes/no  queries 
contain  a  complete  e-messaqe  as  parameter  and  cause  a  response  of 
either  "yes”,  ”no'',  or  ’’don’t  know”.  In  retrieval  queries,  the 
parameter  is  ar  incomplete  message. 


2,6  Binary  logical  Associations 
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”ccncept”  is  that  concepts  are  the  things  dealt  with  at  the 
Ccnceptual  Schema  level.  One  of  the  positions  of  this  model  is 
that  concepts  do  not  constitute  entities.  Entities  are  asserted 
to  have  their  proper  place  in  the  External  Schema.  "Concept"  as 
used  in  this  model  can  be  thought  to  have  its  dictionary  meaning 
of  "general  notion".  The  terras  "pre-entity"  and  "sub-entity" 
also  convey  the  meaning  of  "concept"  as  used  here. 

In  ^his  model.  Logical  Associations  are  restricted  to  be 
binary.  The  reasons  put  forth  are  that  binary  relations  are  the 
least  arbitrary  data  structure  and  that  they  are  not  biased 
towards  any  particular  application  view.  The  Sets  of  Concepts 
which  are  the  domains  of  the  Binary  Logical  /associations  are  all 
Fundamental  Domains,  ie.,  they  are  not  themselves  Logical 
Associations.  This  means  that  nested  Associations  are  not 
allowed. 

To  represent  n-ary  relations  and  other  non-binary 
structures.  Dummy  Sets  of  Concepts  are  used,  and  they  are  treated 
as  other  Sets  of  Concepts  except  that  the  values  of  a  Dummy  Set 
cannot  be  retrieved  by  the  user.  Essentially,  "Dummy"  means  that 
the  content  of  the  Dummy  Set  of  Concepts  is  expressed  in  terms  of 
other,  more  elementary  Sets  of  Concepts. 

There  is  no  distinction  between  Associations  which  are  N:1 
(functions),  and  cnes  which  are  N:M  (many-to-man y)  In  other 
words,  at  the  Conceptual  Level,  the  functionality  of  the 
Associatior  is  not  considered  relevant. 
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2 *7  The  Ent it y- Relationship  Kcdel 

The  entity-relationship  model  [Chen,  1S76]  attempts  to 
provide  a  unified  view  of  data.  Easically,  the  model  consists  of 
sets  of  entities  or  entity  sets  which  denote  real-world  entities 
or  "things"  that  can  be  distinctly  identified,  relatiocshi  p  is 
an  association  among  entities,  and  a  relationship  set,  F  ,  is  a 
mathematical  relation  among  n  entities,  each  taken  from  an  entity 
set.  Thus  each  tuple  of  a  relationship  set  is  a  relationship. 
The  role  of  an  entity  in  a  relationship  is  the  function  it 
performs  in  that  relationship. 

There  are  two  types  cf  entity  relationships:  those  which  are 
used  to  identify  an  entity  and  those  which  are  not.  The  first 
case  is  called  a  weak  ent it  y  relation,  and  the  second  a  reg  ular 
relation .  Weak  entity  relations  represent  "hierarchical" 
relationships,  e.g.,  between  an  employee  and  his  dependents  such 
that  the  employee-dependent  relationship  "identifies"  the 
dependent . 

Information  regarding  an  entity  or  a  relationship  can  be 
expressed  by  a  set  of  attribute  value  pairs.  Values  are 
classified  into  named  sets  such  as  FEET,  CCLOE,  and  FIRST_n;^KE. 
Further,  each  value  set  has  a  predicate  associated  with  it  to 
test  whether  a  value  belongs  to  it.  An  attribute  is  defined  as  a 
function  from  an  entity  set  or  relationship  set  to  a  value  set, 
or  the  Cartesian  product  of  value  sets. 
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There  is  a  distinction  between  conceptual  inf  crma tion 
JlJJJCture  where  one  refers  to  entities  and  relationships,  and 
ilJSISJJticn  structure  where  the  entities  and  relationships  are 
represented  values.  In  both  cases,  the  infcrmation  structure 
is  given  in  tabular  and  d iagrama tical  fora.  At  the  second  level, 
an  entity  {relationship)  is  uniquely  identified  with  and  by  its 
value.  Several  entity  keys  may  exist  but  one  of  them 
is  chosen  as  the  primary  key.  Moreover,  an  entity  key  may 
consist  of  a  group  of  attributes  such  that  the  mapping  from  the 
entity  set  to  the  corresponding  group  of  value  sets  is  one-to- 
one. 

The  entity-relationship  model  uses  a  diagram  (or  graph)  to 
depict  entity  sets  and  relationship  sets.  An  entity  set  is  drawn 
as  a  rectangle,  and  a  relationship  set  as  a  diamond.  Eoth  set 
types  are  named.  ^n  entity  set  defined  be  a  weak  entity  relation 
is  shewn  as  a  rectangle  within  a  rectangle.  Attributes  and  value 
sets  are  not  included  in  the  diaqrair.  Undirected  lines  connect 
entity  sets  to  the  relationship  sets  in  which  they  participate. 
Moreover,  the  lines  are  labelled  with  ”1",  or  "m"  to  denote  the 
number  of  times  an  entity  may  appear  in  relationship  tuples  of  a 
relationship  set. 

logical  access  to  the  complete  model  may  be  achieved  by 
means  of  a  set-oriented  language  at  both  the  conceptual 
information  structure  level  and  the  relation  level. 

Lastly,  the  entity-relationship  model  may  be  used  to  obtain 
a  relational  model  [Codd,  1970]  representation,  the  CPTG/CODASYL 
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Data  Structure  Diagram  [CCDASYL  DETG,  1971]  and  the  DIAM  II  Model 
[ SenXc,  1975b]. 

2.6  The  Data  Semantics  Model 

The  data  semantics  model  proposed  by  Abrial  [1974]  uses  the 
binary  relation  as  its  fundamental  construct.  The  binary 
relation  is  defined  as  an  association  between  two  categories  of 
objects.  The  relation  is  characterized  by  two  access  f uncti cns^ 
each  of  which  is  defined  by  its  name  and  the  minimum  and  maximum 
number  of  objects  in  the  range  category  that  can  be  reached  from 
an  object  in  the  domain  category.  Thus,  if  A  and  B  are 
categories  of  objects  then  a  binary  relation  rel(A,E)  would  have 
access  functions  ajEi^rA — >P  (E)  and  afn2:E-->P  (A)  ,  where  P(X) 
denotes  the  power  set  of  a  set  X.  Thus  the  access  functions  may 
be  multivalued. 

A  data  base  is  considered  to  be  the  model  of  an  evolving 
physical  world,  and  the  model's  state  represents  the  know  ledge  it 
has  acguired.  This  knowledge  can  be  divided  into  four  classes: 

facts  such  as  "John  Taylor  was  torn  in  New  Ycrk  on 
November  1,1973”,  simple  rules  such  as  "Every  man  necessarily  has 
two  parents  of  whom  he  is  a  child",  complex  rules  such  as  "A 
single  person  who  marries  may  not  be  single  again  in  the  future", 
and  rules  which  d educe  or  compute  facts  from  other  facts  such  as 
"The  profit  made  on  a  product  is  the  difference  between  its 
selling  price  and  its  cost  price". 
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In  this  model,  a  distinction  is  made  between  abstract 
objects  which  represent  things  such  as  the  integers,  complex 
numbers  and  dates,  and  concrete  objects  such  as  cars  and  people. 
An  object  is  described  in  the  model  by  its  connections  to  other 
objects.  Concrete  objects  such  as  cars,  invitations,  and  so  on 
are  given  internal  names  or  representations. 

In  this  model  binary  relations  and  access  function 
primitives  are  considered  adequate  for  defining  semantics.  The 
access  functions  help  describe  the  semantics  by  means  of  the 
minimum  and  maximum  cardinality  restrictions,  Moreover,  other 
semantic  properties  such  as  validity,  consistency  and  redundancy 
may  be  described  procedurally  by  programs. 

The  data  semantics  model  provides  for  logical  access  to  data 
by  means  of  glementar y  operations  on  access  functions  via 
pregrams.  Further,  "semantic  extensions"  and  queries  to  data  are 
dene  by  procedures  which  are  equivalent  to  the  first-order 
Predicate  Calculus. 

Finally,  the  complete  Data  Semantics  model,  together  with 
the  procedures  for  infermation  processing  may  be  described 
formally  in  terms  of  the  model  itself, 

2 , 9  The  Functional  Model 

A  functional  model  of  data  has  been  proposed  by  Nijssen 
[1975]  and  by  Kerschberg  and  Pacheco  [1975],  We  present  the 
terminology  of  tne  latter  authors.  An  important  component  of  the 
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functional  isodel  is  its  underlying  labelled,  directed  multigraph 
with  loops.  Ihe  nodes  of  the  graph  represent  either  entity  sets 
that  nodel  real-world  entities  or  abstract  sets  representing 
associations  among  entity  sets.  The  arcs  of  the  graph  represent 
total  functions. 
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S  to  the  Cartesian  product  of  the  function's  range  sets  is  one- 
to-one.  If  the  key  consists  of  more  than  one  function  it  is 
called  a  cpmposite  kej,  otherwise  it  is  a  simple  Jce^.  Multiple 
as  well  as  overlapping  keys  are  permitted  in  the  model. 

logical  access  to  the  complete  functional  model  is  achieved 
by  means  of  a  high-level  declarative  language.  Query 
specification  is  done  ty  walking  (or  navigating)  through  the 
graph . 

The  functional  model  may  be  used  to  obtain  a  relational 
model  representation,  a  DETG/CODASYL  Data  Structure  Diagram,  and 
the  Dli^M  II  model. 

2.10  Ihe  Semantic  Set  work  Model 
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Ihe  seaantic  network  model  of 
Mylopculos,  1975]  attempts  to  capture  the 
stored  in  the  data  base.  The  semant 
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data  [ Houssop 
semantics  of  t 
ic  network  is  a 
labelled.  Node 
while  edge  label 
and  inferences. 


culos  & 
he  data 
directed 
la  bels 
s  have  a 


There  are  four  node  types  which  are  used  to  represent 
knowledge  in  the  model.  They  are  concepts,  e  ve  n  t  s , 
characteristics,  and  value~nodes.  Ccnce£ts  are  essential 
constants  or  parameters  of  the  real-world  and  specify  physical  or 
abstract  objects.  Events  represent  actions  in  the  real-wcrld  and 
they  are  represented  by  a  case-grammar  model  [Fillmore,  1968] 
which  consists  of  an  event  node  and  several  nodes  that  specify 
the  roles  associated  with  the  event.  The  arcs  connecting  events 
to  role  nodes  are  labelled  by  role-names  such  as:  agent  (a), 
affected  (aff) ,  topic  (t) ,  instrument  (i) ,  result  (r) ,  source 
(s),  destination  (d)  and  object  (c) . 


Characteristics  are  used  to  represent  states  or  to  modify 
concepts,  events,  or  other  characteristics.  A  characteristic  is 
a  binary  relation  mapping  the  elements  from  its  domain  to 
"values”  of  its  range.  Graphically,  the  characteristic  is 
represented  as  a  node  labelled  with  the  characterst ic  name, 
together  with  a  "oh"  labelled  edge  from  the  characteristic  node 
to  the  domain  (element)  and  a  "v"  (value)  labelled  edge  pointing 
to  the  corresponding  range  (element).  The  characteristics  may  be 
of  four  types  depending  on  the  nature  of  the  mapping  they 


22 


represent  i.e.,  many- tc-many ,  many- to-one,  one-to-many,  and  one- 
tc-one. 

Knowledge  of  the  data  is  grouped  together  in  a  scenario 
which  is  a  collection  of  events,  characteristics  and  mathematical 
predicates  related  through  causal  connectives  such  as 
"prerequisite”  and  "effects”.  A  scenario  may  be  regarded  as  a 
template,  so  that  information  submitted  to  the  model  would  have 
tc  match  a  scenario  in  the  semantic  network  to  make  "sense"  to 
the  data  base. 

The  semiantic  network  model  admits  a  relational  model 
representation  which  includes  concept- relations,  part-relations, 
event- relations,  and  characteristic- relations.  A  part-relation 
is  used  to  represent  hierarchies  such  as  a  COMPANY  whose  parts 
are  DEPAPTBENTS. 

logical  access  to  the  data  base  relations  is  dene  fcy  means 
of  a  "semantic"  relational  algebra.  Natural  language  is  intended 
to  be  the  high-level  query  language  to  this  model  [Mylopoulos  et 
ai.,  1975]- 

2.11  The  1 J t en^ d  Set  Theory  Model 

Childs  proposes  a  single  formalism  fer  the  design, 
i nplementation  and  operation  of  information  systems.  The  model 
is  thus  intended  to  have  as  its  range  of  applicability  all  of  the 
levels  of  an  information  system  from  the  conceptual  level  to  the 
machine  representation  level. 
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Pn  axiomatic  development  is  outlined  in  Childs  Here 
we  will  present  the  salient  features  of  the  formalism. 

The  basic  objects  of  the  model  are  complexes  (or  qu asi- 
sets) ,  sets  and  n~ tuples.  The  basic  relation  that  can  hold 
between  two  objects  is  that  of  i-membership.  (Here  both  n  and  i 
range  ever  the  natural  numbers.)  If  x  is  an  i-th  member  of  y,  one 
writes  x  €i  y.  The  complex  is  the  most  general  cf  the  three 
types  of  objects.  For  any  i,  a  complex  may  have  any  number 
(C,1,2,...)  of  i-members.  A  set  only  has  1-members.  For  a  given 
E,  an  n-tuple  has  exactly  one  i-member  for  each  i  from  1  to  n. 
Thus  it  can  be  said  that  if  x  €i  y  then  x  is  at  position  i  cf  y 
(fcssibly  along  with  ether  elements). 

To  enumerate  complexes,  the  positions  of  the  elements  are 
written  as  superscripts.  For  example,  the  complex  a={b‘,ci,d3} 
has  members  b  €1  a,  c  61  a,  d  €3  a  and  no  others.  A  3-tuple 
<a,b,c>  may  be  written  {ai,b2,c3),  and  a  set  {a,b,c}  has  the 
cuasi-set  structure  {ai,bi,ci). 

Set  operations  in  Extended  Set  Theory  are  generalizations  of 
the  usual  set  operations.  For  example,  intersection  is  defined 
as  follows:  C=A  n  E  if  and  only  if  for  all  x  and  i,  x  €i  C  if  and 
only  if  X  €i  A  and  x  €i  B. 

There  is  nc  limit  to  the  number  of  operations  definable  in 
Extended  Set  Theory  since  there  is  available  def ini tion-by- 
abstraction  which  here  takes  the  form; 

P.  =  {xi:)Z(x,i)}  if  and  only  if 
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for  all  X  and  i,  x  €i  A  iff  0(x,i)  and 
?!  is  a  coirplex. 

Cne  of  the  results  of  defining  tuples  independently  from 
sets  is  ^hat  set  operations  are  now  neaningful  when  applied  to 
tuples.  For  example, 

<a,b,c>  u  <a,t,d,e>  =  {a ^ ,b2 ^c^^d^ ,©♦)  and 
<a,b>  n  <c,b,e>  =  {b^}, 

Kcre  complicated  examples  are  possible  using  extended  operations 
cn  relations.  Childs  [1968]  has  given  a  collection  of  extended 
set  operations  sufficient  for  most  information  processing 
applications. 

2.12  Information  Management  Concepts 

Eurchholz  and  Richter  have  proposed  Information  Management 
Concepts  (IMC)  as  a  model  of  the  universe  of  discourse  between 
the  user  and  the  DBMS.  The  basic  unit  of  information  exchanged 
is  called  a  cc^truct.  Recursion  is  used  to  define  constructs. 

In decomposable  elementary  constructs  are  called  values. 
Ccmpound  constructs,  those  formed  by  combining  several  constructs 
into  larger  ones,  are  called  aqqrega tes.  Aggregates  are  said  to 
be  composed  of  constituents.  Names  can  be  assigned  to  the 
constituents  of  a  construct  to  introduce  information  structure  to 
the  model  and  to  provide  a  selection  mechanism.  Aggregates 
having  named  constituents  are  called  ngmina ticns.  Aggregates 
with  unnamed  constituents  are  referred  to  as  collections. 
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Aggregates  having  a  mixture  of  named  and  unnamed  constituents  do 
net  appear  to  be  needed.  Mathematically,  nominations  are 
functions  napping  names  to  constructs,  and  collections  are  finite 
sets.  Collections  of  nominations  appear  often  and  are  given  the 
name  collective. 

While  a  "collection”  views  a  set  as  an  assemtly  of  things,  a 
set  maj  also  be  viewed  as  an  abstraction,  separating  set-specitic 
properties  from  element  distinguishing  properties.  Sets  of 
constructs  expressing  property  separation  and  abstraction  are 
called  t^^es.  The  intensional  distinction  between  a  type  and  a 
construct  is  that  a  type  expresses  a  restriction  of  the  universe 
of  discourse  while  a  construct  is  model  information.  Note  also 
that  constructs  are  always  finite  whereas  types  may  be  infinite. 
Types  are  defined  with  a  type  description  language  (TDI) . 

An  element  of  a  type  is  called  an  instance  of  that  type.  A 
description  cf  a  type  in  a  given  TDL  is  called  a  schema .  There 
are  a  variety  cf  possible  ways  to  describe  a  type.  The  instances 
cf  a  small  type  can  be  enumerated.  Elements  of  complex  types 
cculd  be  reccgni2ed  by  decision  machines.  So-called  "normal" 
types  are  defined  by  properties  of  aggregation  and  name 
assignment.  Such  types  are  described  by  the  way  their  instances 
are  built  up  from  const! tuen ts,  where  constituents  are  types 
defined  in  the  same  way.  The  recursion  stops  at  value  types. 
Each  type  is  given  a  name,  its  type  designation .  A  type 
designation  is  functionally  different  from  a  name  of  a  construct 
constituent.  The  latter  is  operative  at  the  information  level. 
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while  the  for»er  is  used  at  the  me ta- inf omia tion ,  ie.  ,  schema 
level. 


Within  a  data  base  there  will  be  a  number  of  constructs 
represented  which  are  of  interest  to  the  user.  It  is  useful  to 
consider  there  being  an  all-embracing  aggregate  containing  all 
ether  constructs  as  constituents.  This  aggregate  is  called  the 
framework  construct .  The  basic  operations  of  information 
handling  may  then  be  defined  as  pointing  to  components  in  the 
framework  con  struct  and  making  re guests  to  change  the  framework 
construct . 

Names  can  be  used  to  select  one  of  the  constituents  of  a 
nomination.  Eut  the  same  name  may  appear  many  times  within  the 
framework  construct.  P.  construct  within  a  nesting  of  nominations 
may  be  selected  by  a  composition  of  a  name  from  each  level.  Such 
a  composition  is  called  an  i^eilb  i  f  •  Eut  even  identifiers 
cannot  select  constructs  from  collections.  Constituents  of 
collections  are  identified  by  their  properties.  Hence  a  sequence 
or  composition  of  names  and  properties  can  be  used  to  lead  to 
exactly  one  construct  within  ary  sort  of  nesting  of  constructs. 
Such  a  sequence  is  said  to  define  a  s pot.  Spots  can  be  labelled 
for  reference  purposes.  It  should  be  noted  that  spots  are  not 
equivalent  to  the  constructs  they  lead  to.  Since  the  same 
construct  may  be  a  component  of  many  different  constructs,  a  spot 
adds  to  its  ta rqe t  construct  information  about  the  particular 


environment  in  which  it  occurs 
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With  the  above  concepts  the  basic  function 
defined.  These  basic  functions  include  input 
output  of  a  construct,  generation  of  a  spo 
label  to  a  spot,  and  removing  of  a  label  from  a 


s  of  a  DBMS  can  be 
of  a  construct, 
t,  attachment  of  a 
spot. 


2.13  The  Information  Algebra  Model 


The  Informaticn  algebra  model  was  proposed  by  CODASYL  [1962J 
and  has  been  extended  by  Fobayashi  [1975],  In  this  approach  the 
real  world  is  viewed  as  being  composed  of  various  entities  with 
various  properties  attributed  to  them.  Every  property  of  an 
entity  has  a  value .  There  may  also  exist  different  relationships 
among  the  entities  in  the  universe  in  Kobayashi's  extension. 

This  situation  is  represented  by  an  inf c r ma tier  space.  The 
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B  elati 

onsh 

ips  existing 

among 

enti 

t  u 

Fl 

es 

def  i 

ned 

on  the  informa 

tion  s 

pace . 

as 

su 

m  ed  to 

be 

decomposed 

into  e 

lerae  nt 

bi 

na 

ry 

re  la 

t  i  on 

may  itself  be 

rega 

rded 

en 

ti 

ty 

it  h 

as  t 

wo  special  propertie 

s,  its 

Alth 

ough  it 

is 

possible,  rel 

ations 

a  re 

re 

Ft 

es< 

Ented 

by 

points  in 

the  i 

rf orma 

ties  are  represented  by 
All  n-ary  relations  are 
ary  binary  relations.  A 
as  an  entity.  As  an 
origin  and  destination. 


not 

oonsidered  to 

be 

tion 

space.  They 

are 

28 


distirguished  ty  the  fact  that  their  two  special  properties  have 
the  whole  inforiEation  space  as  value  sets,  rather  than  the  values 
cf  one  cccrdinate. 

Two  ether  tasic  concepts  are  those  of  structure  and  area. 
Formally,  an  area  is  any  subset  of  the  information  space.  A 
structure  is  a  set  cf  non-duplicate  binary  relations.  Thus  a 
structure  may  be  regarded  as  a  directed  graph.  In  practice, 
structures  are  defined  or  specified  areas,  and  areas  are 
described  by  what  structures  are  defined  on  them.  Structures  are 
grouped  into  three  (nested)  types:  linear  structures,  trees  and 
networks . 

The  information  processing  model  has  three  language  levels: 
high,  intermediate  and  low.  We  will  only  describe  the 
ir. termediate  language. 

The  intermediate  level  language  consists  of  a  number  of  set- 
oriented  operations.  There  are  the  three  usual  set  operations  of 
union,  intersection  and  difference,  defined  on  areas,  which  are 
tasic  in  the  sense  of  not  using  any  of  the  structure  of  the 
information  space.  The  more  complicated  operations  require  the 
definition  cf  three  types  of  functions.  These  functions  may 
represent  many  things  such  as  names  or  arithmetic  computations. 

The  domain  of  a  function  of  points  (FCP)  is  the  information 
space.  A  line  is  an  ordered  set  of  points  in  the  information 
space.  The  domain  of  a  function  of  lines  (FOL)  is  the  set  of 
lines  in  the  space.  The  domain  of  a  function  of  areas  (FOA)  is 
the  set  of  areas  of  the  space.  The  values  of  these  functions  may 
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be  numbers,  truth  values  and  strings  of  infornaticn  space  points. 
The  ether  operations  now  use  these  functions  as  parameters.  The 
complete  set  of  operations  is  given  by  Kobayashi  [1975], 

2.14  The  Bata  Space  Model 

In  Eachman's  [1973]  Data  Space  Model  a  data  base  is  viewed 
as  a  set  of  points  in  a  three  dimensional  space.  The  three 
coordinates  are  labelled  entity ,  attribute  and  niea^re.  The 
three-dimensionality  of  the  model  permits  the  use  of  the  familiar 
geometric  concepts  of  point,  line  and  plane. 

Each  entity  is  assigned  a  unique  entity  number,  i,e.,  a 
value  on  the  entity  axis,  All  data  concerning  an  entity  is 
represented  by  a  set  of  points  in  the  plane  perpendicular  to  the 
entity  axis  and  intercepting  it  at  the  given  entity  number.  This 
plane  is  called  an  iso-entity  plane. 

The  Bata  Space  Model  treats  the  three  coordinates  on  an 
equal  basis.  Thus  we  may  also  speak  of  iso-attribute  planes  and 
isp-measure  planes.  Note  that  since  the  measure  coordinate  is  at 
parity  with  the  entity  and  attribute  coordinates ,  it  is 
meaningful  to  ask,  for  example,  "What  are  all  the  attributes  with 
measure  ‘black’?" 

There  is  nc  particular  order  to  any  of  the  three  dimensions. 
Fce  example,  the  values  along  the  measure  axis  might  be  14,  red, 
74,  belligerent.  The  values  along  the  axes  represent  specific 
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measures,  attritutes  and  entity  numbers  that  exist  in  the  system, 
net  all  possible  values, 

troBi  the  examples  Eachman  gives  we  may  abstract  a  set  of 
geometrically  defined  operations  for  accessing  the  data  base. 
They  are  the  following:  Given  an  iso-plane,  select  a  point  in 
this  plane.  Given  a  projection  line,  select  a  point  cn  this 
line.  Form  the  iso-plane  defined  by  two  intersecting  projection 
lines.  Form  the  projection  line  defined  by  two  intersecting  iso¬ 
planes.  Intersect  an  iso-plane  with  a  projection  line 
perpendicular  to  it.  Form  the  iso-plane  through  a  given  point 
perpendicular  to  some  given  axis.  Form  the  projection  line 
through  a  point  parallel  to  a  given  axis. 

2.15  The  Lindgreen  Model 

In  this  model  a  data  base  is  viewed  as  a  picture  of  some 
external  system.  The  system  can  be  considered  a  set  of 
interrelated  entities,  each  characterized  by  a  set  of  properties. 
Some  properties  denote  attributes  of  entities  and  others 
constitute  relations  between  entities.  Properties  are  said  to 
have  values,  which  are  of  two  kinds:  data  sets  and  numeric 
values,  Cata  sets  are  the  result  of  a  representation  process, 
while  numeric  values  result  frem  physical  mea sure ment s .  The 
values  associated  with  a  given  property  are  of  the  same  kind  and 
form  the  formal  value  set  of  the  property.  The  formal  value  set 
of  an  entity  fflating  property  consists  of  references  to  a 
certain  class  of  entities.  Entities  of  this  class  must  have  an 
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i d € n t i f Y i pq  ££0£ert^.  The  value  set  of  this  property 
ccntairs  the  values  of  the  former  entity  relatinq  property. 

The  basic  element  of  information  is  a  triplet  <€,p,v>  where 
e  is  an  entity,  p  is  a  property  relevant  to  e  and  v  is  a  member 
of  p's  value  set.  All  the  information  represented  in  a  data  base 
ccnsists  of  basic  information  elements,  but  to  understand 
information  retrieval,  information  elements  are  organi2ed  into 


information 

sets . 

An  information  set. 

all  of 

whose 

tri pies 

contain  the 

same 

property  component 

is  called 

an 

isotypic 

information 

set. 

An  information  set. 

all  cf 

whose 

tri pies 

contain  the  same  entity  component 

is  called 

an 

isonymous 

in  formation 

set. 

A  homogeneous  information  set 

(HIS) 

is  one 

which  can 

be 

partitioned  into  a 

collection 

of 

isony  mous 

information 

sets. 

each  with  the  same  set 

cf  properties 

(property 

spectrum) .  Alternatively,  a  HIS  can  be  partitioned  into  a 
collection  of  isotypic  information  sets,  each  with  the  same  set 
of  entities  (entity  spectrum) .  Both  isotypic  and  isonymous 
information  sets  as  well  as  single  information  elements  are 
special  cases  of  HISs.  Homogeneous,  isonymous  and  isotypic 
infor ma tier  sets  correspond,  respectively,  to  relations,  tuples 
and  columns  of  a  relaticn- 

Logical  access  is  represented  by  one  monadic  and  six  binary 
operations  in  HISs.  They  are:  extraction,  conglomeration, 
separation,  derivation,  detection,  exposition,  and  inclusion, 

3-  Tte  Classification  of  the  Ca^  Hodels 
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In  this  section  we  classify  the  data  models  with  respect  to 
the  taxonciy  framework  presented  earlier.  The  classification 
begins  with  structural  comparisons.  Here  graph  theoretic  models 
are  compared  with  set  theoretic  models.  The  models  are  also 
classified  with  respect  to  their  use  of  binary  relations,  n-ary 
relations,  or  a  combination  of  both. 


In  section  3.2  we  discuss  how  the  data 
influences  logical  access.  Section  3.3  deals 
comparisons  and  terminology.  Lastly,  section 
"semantics”  of  data  models. 


model's  structure 
with  conceptual 
3.4  discusses  the 


■3-1  Structural  Comparisons 


In  this  section  we  isolate  and  compare  the  mathematical 
structure  of  the  data  models.  Conceptual  differences  are  for  the 
moment  ignored.  The  first  analysis  sees  the  models  divided  into 
graph  theoretic  and  set  theoretic  classes.  This  division  is 
guite  natural  since  some  models  have  an  obvious  graph  structure 
while  those  that  do  not  tend  to  use  set  operations  in  a  more 
significant  way  than  the  graph  models. 


First,  data  models  with  graph  structure  are  presented, 
we  describe  the  structure  of  the  set  theoretic  data  models, 
third  subsection  presents  a  partition  of  the  data  models  base 
their  use  of  binary  relations,  n-ary  relations,  or  a  mixture 
both.  As  a  result,  the  evolution  of  DIAM-I  to  DIAM-II  is  eas 
understand. 


Next 
The 
d  on 
of 
y  to 
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3.1.1  Gr Theoretic  Models 


The  nodes  of  the  models  are  categorized  as  either  structured 
or  unstructured.  The  structured  nodes  are  types  cf  records  or 
tuples.  To  represent  a  structured  node  in  a  model  with 
unstructured  nodes,  new  nodes  are  introduced  for  each  field  of 
the  record  type  and  are  connected  to  the  original  node.  The 
reverse  process  may  be  more  difficult  since  it  necessitates 
identifying  ncdes  which  are  functioning  as  value  sets  and  ones 
which  are  functioning  as  record  types.  A  process  similar  to  this 
is  described  by  Adita  et  al.  [1976].  Mathematically, 
unstructured  ncdes  are  just  pcint  sets. 


The  edges  represent  the  means  for  moving  from  node  to  node, 
ie  are  viewing  the  nodes  as  sets  and  we  can  view  the  edges  also 
as  mathematical  objects,  namely  binary  relations.  As  such  it  is 
meaningful  to  consider  the  functionality  of  the  edges. 

Four  cf  the  models  have  functional  edges:  the  DETG  model, 
the  Hierarchical  Kcdel,  the  Functicnal  Model  and  the  Entity- 
Eelaticnsbip  Model.  In  the  traditional  way  of  drawing  EETG  and 
Hierarchical  diagrams,  the  edge  points  from  the  range  to  the 
dcitain  of  the  function.  The  functional  Model  has  the  edges 
pointing  from  domain  to  range,  the  usual  custom  in  mathematics. 
The  functicns  cf  the  Hierarchical  and  Functicnal  Models  are  total 
while  these  cf  the  DETG  Model  may  be  partial.  Nijssen  [1975] 
discuss  how  CETG  cwner  coupled  sets  may  be  viewed  as  functions. 
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There  are  two  models  which 
cccuriences:  the  Hierarchical  and 

The  graph  of  the  Hierarchical 
trees,  while  in  the  latter  model 
ccnnect  only  relationship  sets  with 
of  these  sets  are  incident  to  value 


place  restrictions  on  edge 
Entity-Pelationship  Models. 
Model  must  be  a  collection  of 
*s  graph,  participant  edges 
entity  sets.  Attribute  edges 
sets. 


Also,  Steel  [  1975  ]  has  recotn mended  that  the  restriction  that 
DETG  cosets  connect  different  record  types  be  removed,  since  it 
is  mainly  an  implementation  restriction.  The  models  described  by 
directed  graphs  or  sets  are  classified  in  Tables  1  and  2, 
respectively. 


3.1.2  Set  Theoretic  Models 

Although  the  tuple  also  embodies  data  relationships  in  graph 
models  with  structured  nodes,  in  set  theoretic  models  the  tuple 
is  the  principle  vehicle  for  representing  relationships.  He  use 
two  parameters  for  subdividing  the  set  models.  He  group  them 
according  to  tuple  size  and  according  to  set  nesting.  Each  model 
either  has  a  nested  or  a  flat  structure. 


Ey  nested  structure  we  m 
themselves  be  sets.  The  three  mo 
the  Information  Management  Ccnce 
Hierarchical  Relational  Model.  T 
their  structure.  Collections  co 
correspond  to  tuples.  The  elemen 
by  names  while  the  elements  of 


ean  that  elements  of  tuples  can 
dels  with  nested  structure  are 
pts.  Extended  Set  Theory  and  the 
he  first  two  are  very  similar  in 
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ts  of  a  nomination  ace  selected 
a  tuple  are  selected  by  natural 
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EUBbers.  The  lain  difference  between  selection  by  name  and 
selection  by  natural  nuirber  is  that  any  tuples  of  the  same  size 
have  the  same  selectors,  while  twc  nominations  can  have  none, 
some  or  all  of  their  selectors  the  same.  The  complexes  of 
Extended  Set  Theory  dc  not  have  a  counterpart  in  the  Durchholz 
ard  Richter  Model. 

Hhen  we  say  that  a  model  has  a  flat  structure  we  mean  that 
the  elements  of  tuples  are  "simple"  objects.  Unnormalized 
relations  and  the  e- messages  of  the  I nt olcgica 1  Model  are, 
precisely  spcaXing,  not  flat.  However,  it  is  not  proper  to  call 
them  nested  structures.  Rather,  they  are  the  result  of  braclceted 
Cartesian  products,  which  Codd  [1971a]  refers  to  as 
"factorization". 

The  flat  models  are  further  divided  according  to  the  size  of 
the  tuples  they  use.  Se  will  first  discuss  the  models  with 
tuples  of  arbitrary  length. 

The  Relational,  Information  Algebra  and  Infological  models 
all  use  arbitrarily  sized  tuples.  It  is  convenient  to  consider 
the  Relational  Model  as  middle  ground  ard  tc  relate  the  other  two 
models  to  it. 

The  e-concept  of  the  Intolcgical  Model  is  approximately 
ecuivalent  tc  a  relation.  The  e-ccncept  defines  an  e-message 
type  and  all  e-messages  cf  this  type  can  be  considered  tuples  of 
the  corresponding  relation.  However,  the  e-messages  still 
contain  the  relation  or  property  name  as  part  of  their  structure. 
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This  difference  is  not  really  significant,  though,  because 
Infological  e-aessages  and  tuples  of  a  relation  are  not  at  the 
same  level  of  abstraction.  At  the  Datalogical  level,  e-messages 
ate  represented  by  e-records  iihich  do  not  contain  the  relation  or 
property  name  and  therefore  are  equivalent  to  tuples. 

The  time  dimension  of  e-messages  is  significant.  The 
relations  in  a  relational  data  base  are  time-varying  and  they 
represent  the  current  state  of  the  world.  The  e-messages  in  an 
Infological  data  base  are  also  ti me- varying .  However,  an  e- 
message  can  represent  a  past,  present  or  future  fact  and  its 
presence  in  the  data  base  signifies  that  the  fact  is  known. 


In  the 

Information 

Algebra 

Model 

there 

is 

only  one  se 

t  of 

tuples,  the 

information  s 

pace.  Since  there  is 

a 

coordinate 

in 

the  space 

for  every  att 

ribute  of 

every 

entity 

re 

presented , 

null 

values  are  provided  when  the  attribute  is  not  applicable.  The 
information  space  can  be  thought  of  as  a  conglomeration  of  a 
number  of  relations.  Each  tuple  has  been  extended  by  null  values 
to  include  all  of  the  attributes  of  the  other  relations.  The 
areas  of  the  infcrmation  space  corresponding  to  relations  can  be 
identified  by  an  appropriate  function  of  areas. 

The  three  models  we  discuss  next  all  use  triplets  as  the 
basic  information  unit. 

The  Data  Space  Model  and  Lindgreen*s  Model  are  very  similar. 
The  components  of  the  triplets  have  the  names  Oentity,  attribute, 
measur€>  and  <entity,  property,  value>,  respectively.  The 
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g€cmetric  view  of  the  Data  Space  Model  is  contrasted  against  the 
mere  set-theoret ic  view  of  Lindgreen*s  Mcdel.  The  iso-entity 
plane  and  the  iso-attribute  plane  of  the  Data  Space  Model  are 
equivalent,  respectively,  to  the  iscnyiacus  infor«ation  set  and 
the  isotypic  inforiration  set  of  Lindgreen's  Model.  In  the  Data 
Space  Model  the  concept  of  measure  is  at  full  parity  with  the 
concepts  of  entity  and  attribute.  Thus  the  model  also  uses  iso¬ 
measure  planes  to  which  there  is  no  equivalent  in  Lindgreen's 
Mcdel . 

ElAM-II  is  the  successor  of  the  original  DIAM-I  as  reported 
by  Senkc  et.  al.  [1973].  It  is  interesting  to  note  the 
correspondence  in  the  structure  of  the  two  models. 

When  DI7^M-I  was  being  developed,  it  was  considered  whether 
the  binary  association  was  the  most  desirable  form  for  the 
description  of  entities.  /although  it  was  recognized  then  that  a 
fact  is  descriptive  cf  both  entities  (n-aty  facts  will  be 
discussed  below) ,  binary  associations  were  considered  to 
necessitate  the  introduction  cf  too  many  names  into  the  system. 
An  asymmetrical  representation  of  facts  was  chosen,  namely  the 
entity  description.  Very  briefly,  an  entity  is  described  by  an 
entity  description  which  consists  of  a  set  cf  name  triplets 
representing  elementary  properties  or  facts ,  Each  triplet 
certains  a  Name  Set  Name,  a  Bole  Name,  and  an  Entity  Name  taken 
from  the  Name  Set.  Descriptions  of  entities  from  the  same  entity 
set  are  gathered  into  a  description  set  with  a  system-unique 


name. 
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The  binary  form  for  associations  was  choosen  for  DIAM-II  to 
refflcve  twc  representation  dependent  aspects  of  First,  in 
DIAH-I,  a  query  ffust  specify  a  description  set  name  rather  than 
the  name  of  the  entity  set  (identifier  name)  for  which  the  query 
requests  information.  Second,  the  sometimes  arbitrary  decision 
of  which  cf  the  twc  possible  places  to  stcre  a  fact  must  be  made. 

EIAM-II  resolves  these  difficulties  by  removing  descriptions 
and  description  sets.  The  name  triplet  has  become  the 
association  pair.  Role  names  in  EIAM-I  correspond  to  association 
names  in  ElfiM-II.  The  naming  problem  is  minimized  by  realizing 
that  association  names  need  be  unique  only  within  the  set  of 
association  names  emanating  from  a  particular  identifier. 
Associations  in  DIAM-II  always  come  in  pairs,  i.e.,  facts  have  a 
symmetric  representation  in  DIAf1-II  and  so  there  is  no  question 
about  where  a  fact  should  be  stored.  Making  triplets  symmetric 
removes  the  distinction  between  an  entity  and  a  value,  as  both 
are  treated  as  entities. 

The  structure  of  DIAM-I  can  also  be  correlated  with  that  of 
Lindgreen's  Model.  Ey  rewriting  the  CIAM-I 
Name_£et_Naroe/Bole_Name/Entity_Name  triplets  in  the  form 
Rcle_Name/ (Name_Set_Name) /Ent ity_Na me,  we  may  note  the 
correspondence  between  a  EIAM-I  entity  description  and  an 
iscnymcus  information  set  from  which  the  entity  has  been 
factored.  The  Role  Name  corresponds  to  a  property  and  the  Entity 
Name  corresponds  to  a  value.  The  Name  Set  Name  may  also  be 
considered  part  cf  the  value,  but  it  more  closely  matches  the 
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fcroaal  value  set  of  the  property.  The  described  entity  appears 
explicitly  in  a  factored  iscnyoous  inforiaticn  set.  The  entity 
tc  which  a  description  refers  is  specified  by  one  (or  more)  of 
the  triples  acting  as  identifiers.  The  information  as  to  which 
of  the  triples  identify  the  entity  must  be  )cept  in  a  Eescription 
Set  Catalog.  Entity  descriptions  for  entities  from  the  same 
entity  set  are  collected  into  Description  Sets  which  correspond 
tc  homogeneous  infcrmation  sets. 

The  three  models  presented  here  which  use  triple s  as  the 
basic  infcrmation  unit  can  be  ordered  according  to  their 
treatment  of  the  three  coordinates.  The  Data  Space  Podel  is  the 
meet  hcmogenecus.  The  entity,  attribute  and  measure  coordinates 
are  treated  cn  an  equal  basis.  In  Lindgreen’s  Model  entity  and 
property  are  dealt  with  similarly.  For  instance,  a  homogeneous 
irformatict  set  may  be  factored  as  a  collection  of  isenymous 
information  sets  or  as  a  collection  of  isotypic  information  sets. 
In  DII'M-I  entities  have  precedence.  There  is  no  primitive  notion 
corresponding  to  an  isotypic  information  set. 

3.1.3  Mathematical  Structure 

Instead  of  classifying  models  as  to  their  graph  theoretic  or 
set  theoretic  structure,  we  can  study  the  way  they  relate 
"things”  in  the  model.  By  things  we  mean  entities,  data  items, 
values,  etc.  A  data  model  can  relate  things  in  three  possible 
ways:  binary  relations,  n-ary  relations,  or  a  combination  of 
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bcth.  The  grouping  of  models  according  to  this  scheme  is 
presented  in  Table  3. 

In  Einary  models,  things  are  related  in  pairs  only.  The 
triplet  set  models  and  the  graph  models  with  unstructured  nodes 
are  brought  together  in  the  binary  relation  group.  The  close 
relaticnship  of  these  two  classes  is  emphasized  by  the  presence 
of  bcth  EIAH-I  and  DIAM-II.  The  main  distinction  between  the 
Graph  Theoretic  and  the  Set  Theoretic  subclasses  of  the  Einary 
grcup  is  that  the  set  models  maintain  asymmetric  concepts  of 
"thing  being  described"  and  "thing  describing",  while  the  graph 
models  treat  the  two  connected  things  as  being  egually  important. 
These  views  will  be  compared  when  we  make  conceptual  comparisons. 

Another  characteristic  cf  the  Binary  models  is  the  presence 
or  absence  of  functionalities.  The  homogenecus  information  sets 
of  Lindgreen  have  only  one  value  associated  with  each  property  of 
the  entities  in  the  entity  spectrum  of  the  HIS.  Thus  a  property 
can  be  considered  a  function  from  the  entity  spectrum  of  the  HIS 
tc  the  value  set  of  the  property.  In  an  entity  description  of 
DIAM-I  a  Bcle  Name  appears  only  once.  Hence,  a  Role  Name  defines 
a  function  with  a  description  set  as  domain  and  a  Name  Set  as 
range.  The  third  model  with  functionalities  is,  of  course,  the 
Functicnal  Hodel. 

The  ether  four  models  dc  not  use  functional  relations.  The 
rata  Space  Model  allows  multivalued  attributes,  and  BIAM-II,  the 
ElA  Model  and  Abrial's  Model  permit  many-tc-many  associations. 


Ihe  ffodels  catagorized  as  N-ary  are  the  Relaticnal  Model, 
Ccdasyl’s  Infornation  ^^.Igebra  and  the  Infolcgical  Model.  Things 
are  related  in  these  models  by  various  forms  of  the  tuple.  In 
the  Information  Algebra  and  Relational  Models  it  is  data  items 
which  are  related  by  tuples.  The  Infological  Model  relates 
objects  by  tuples.  Property- type  e-messages  in  this  model  have 
similarities  with  binary  relations.  However,  the  attribute-value 
structure  of  properties  is  de-em ph asized  and  property- type  e- 
messages  can  be  considered  unary  relations. 

Eata  models  in  the  third  category  relate  things  in  both  a 
binary  and  an  N-ary  fashion.  As  occurred  with  the  Binary  group, 
subclasses  of  graph  theoretic  and  set  theoretic  models  are 
grcuped  together. 

?t  the  top  of  the  Binary  +N-ary  classification  list  we  find 
the  General  Network  [ Tsichrit zis ,  1975],  Semantic  Network,  DETG, 
and  the  Informaticr  Algebra  models.  The  n-ary  relationships  are 
represented  by  event  nodes,  record  types,  and  areas.  The  binary 
relationships  are  the  roles,  links,  and  structures.  Between  any 
two  record  types  (areas)  any  number  of  links  (structures)  may  be 
defined. 

^t  the  bottom  of  this  list  are  the  IMC  and  Extended  Set 
Theory  models.  The  single  binary  relationship  is  set  membership. 
Set  membership  defines  th  existence  of  the  enclosing  set  or 
construct,  as  the  enclosing  set's  very  identity  depends  on  its 
members.  There  can  be  at  most  one  membership  relation  between 
any  two  sets  or  constructs. 
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In  the  middle  of  the  list  are  the  Hierarchical  and 
Hierarchical  Helational  models.  These  models  have  certain 
qualities  mentioned  for  the  other  subcategories.  Namely,  they 
are  similar  to  Extended  Set  Theory  and  Information  Management 
Ccncepts  (IMC)  in  that  there  can  be  at  most  one  binary 
relationship  betMeen  two  segment  types  cr  relations.  On  the 
other  hand,  the  existence  of  the  superior  segment  (relation)  is 
net  predicated  upon  the  existence  of  its  subordinate  segments 
(relations) . 

It  may  be  argued  that  sets  also  represent  relationships  in 
models  of  the  other  categories.  However,  sets  in  these  models 
serve  more  as  cataloging  devices,  factoring  out  information 
cemmen  to  the  elements  of  the  set.  For  example,  all  the  sets 
constituting  relations  in  a  relational  data  base  can  be  mapped 
onto  a  single  infer mati on  space  of  Information  Algebra.  The 
identity  of  the  relations  is  preserved  by  a  function  of  areas. 

3 -2  bcgicj^  Access  in  the  Models 

We  have  described  and  compared  the  static  structure  of  the 


mcdels  under 

consideration. 

To 

better  understand 

the 

significance  of 

the  structures 

we 

should  lock  at  how  the 

data 

base  is  accessed. 

Our  discussion 

of 

logical  access  dees 

not 

compare  particular  data  manipulation  languages.  For  example,  we 
will  discuss  relational  calculus,  but  not  DSL  ALPHA. 

At  the  coarsest  level  we  distinguish  between  e leme nt-a t-a- 
time  access  and  set-a t-a- t ime  access.  Under  set-at-a-tim e  access 
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we  differentiate  between  algebra  oriented,  mapping  oriented  and 
calculus  oriented  access.  The  models  are  categorized  according 
to  this  scheme  in  table  6.  He  did  not  include  models  which  did 
not  have  a  definite  access  associated  with  them.  Conversely, 
some  models  appear  in  more  than  one  column  if  they  support  more 
than  ore  Xir.d  of  access. 

It  should  be  noted  that  other  classification  schemes  are 
possible.  For  example,  Codd  and  Eate  [1974]  give  a  table  of 
language  types  with  nine  categories,  while  Stcnebraker  and  Held 
[1975]  distinguish  between  two  types  of  access. 

Elemen t-a t-a-time  access  is  inherently  procedural  and 
therefore  requires  a  host  programming  language  in  which  to 
operate.  For  example,  the  DETG  model  uses  COECL  and  the 
hierarchical  model  of  IMS  uses  PL/I,  COBOL  or  i^ssemblet.  The 
basic  operation  in  this  type  of  access  is  the  FIND  operation 
which  returns  to  seme  workspace  or  variable  exactly  one  instance 
of  a  record  type.  (The  operation  may  also  fail  and  return  only 
status  information).  Control  statements  such  as  GOTC’s  and  IF- 
statements  must  be  interspersed  with  FIND  statements  in  order  to 
extract  the  desired  in forma ticn  from  the  data  base.  In  a  ddit ion, 
explicit  cr  implicit  currency  indicators  cr  cursors  must  exist  to 
keep  track  of  the  state  of  the  processing. 

/'Igehra  oriented  access  is  by  its  nature  limited  to  the  set 
mcdels.  The  reason  there  is  ro  notion  of  an  algebra  for  a  gra ph 
ncdel  has  been  expressed  by  Codd  [1974]:  hew  do  you  define,  for 
example,  a  union  operation  for  owner-coupled  set  occurrences  that 
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retains  ownership  inf crina tion  for  each  neaber  record  and  yields  a 
result  which  is  an  owner-coupled  set  occurrence? 

In  the  relational  model,  the  algebra  consists  of  operations 
on  relations,  yielding  new  relations.  In  the  information  algebra 
model  of  Codasyl,  the  algebra  consists  of  operations  on  areas 
resulting  in  new  areas.  Eoth  algebras  have  the  three  basic  set 
operations  of  union,  insertion  and  difference.  For  making  inter- 
relaticn  cr  inter-area  correlations  the  join  and  bundling 
operations  are  used.  Eoth  models  have  additional  operations  such 
as  projection  and  glumping  which  help  produce  the  desired  output. 

Mapping  oriented  access  is  often  called  navigation. 
("Navigation"  also  includes  some  kinds  of  eleroen t-a t-a- t ime 
access) .  Mapping  access  has  two  main  characteristics:  There  is 
direction  cr  movement  associated  with  the  access,  and  generally 
there  are  no  new  objects  created  by  the  access,  i.e.,  in  graph 
models,  no  new  nodes  are  created. 

Mapping  access  is  mostly  associated  with  graph  models. 
However,  mapping-type  access  is  definable  for  set  models.  In  the 
relational  model,  a  mapping  is  defined  by  a  composition  of 
relations.  One  example  of  mapping  access  in  relations  is  the 
SCUABE  language  [Eoyce  et  al.  1975],  SQUARE  mappings,  however, 
ate  functions  on  int rarelational  domains.  It  is  the  composition 
of  SQUARE  mappings,  which  allows  travelling  from,  relation  to 
relation,  which  corresponds  to  our  notion  of  mapping. 
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In  th€  graph  models  the  mappings  are  the  objects  we  have 
identified  as  edges,  e. g. ,  functions  and  links.  In  these  models 
the  basic  operation  is  the  application  of  an  edge,  that  is,  a 
mapping,  to  a  subset  of  one  node  yielding  as  a  result  a  subset  of 
ancther  node,  the  ether  node  connected  by  the  edge.  To  start  the 
traversal  cne  cr  more  initial  subsets  must  be  generated.  Two  or 
more  subsets  of  a  node  already  generated  initially  cr  by  a 
mapping  may  be  combined  via  the  set  operations  of  union, 
intersection  and  di f fere nee  to  give  ancther  subset  to  which  more 
mappings  may  be  applied.  It  is  also  essential  that  subsets  of 
the  range  of  a  mapping  can  be  selected  according  to  the 
interrelated  properties  of  an  object  in  the  range  and  the  domain 
objects  mapped  to  it.  This  is  accomplished,  for  example  by  using 
block  labels  in  SEQUEL  [Chamberlin  and  Eoyce  1974]  and  by  keep 
clauses  in  the  general  network  model  [Tsichritzis  1975]. 

In  general,  access  via  mappings  dees  not  form  a  path  in  a 
graph;  it  can  be  any  directed  subgraph  where  the  direction  of  the 
path  need  net  agree  with  the  direction  of  the  edges. 

Although  Boolean  selection  expressions  are  used  with  the 
ether  access  types,  with  the  calculus  oriented  access,  predicates 
are  the  only  means  for  specifying  access  to  the  data  base. 
Conceptually,  calculus  access  is  very  simple.  Cne  specifies  what 
things  are  wanted  and  gives  a  predicate  expressing  the  properties 
these  things  should  have.  Calculus  access  is  so  basic,  in  fact, 
that  it  can  be  used  in  any  model.  A  particular  model,  however, 
may  have  a  structure  which  makes  another  kind  of  access  more 
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natural.  for  example^  in  the  qeneral  network  ffodel  it  is  more 
natural  to  write  SELECT  B  ICH_EMFLO¥EES  LINK  WITH  OBN  TC  HOUSE 
KEEP  PRICE  than  to  write  {HOUSE. PRICE:  (SOME  X)  (RICH_EMPLOYEE  (X) 
and  OBN(X, HOUSE))}. 

Cifferert  models  require  different  kinds  cf  predicate 
calculus  to  express  queries.  In  the  relational  model  the 
calculus  contains  a  predicate  cor respondi nq  to  each  relation  in 
the  data  base.  The  variables  of  the  calculus  represent  tuples. 
Ccmponents  of  tuples  are  represenyed  by  tuple  variables  followed 
by  constants.  Forxulas  are  built  up  from  elementary  relations 
aircnq  tuple  components. 

Hierarchical  relations  require  a  type  theory  since  a 
component  cf  a  tuple  may  itself  be  a  relation.  Set  theory  needs 
twc  binary  predicates,  one  fcr  equality  and  one  for  set 
roembershi p. 

3.3  Ccnce^tual  Com  parisons 

In  section  3.1  we  made  comparisons  based  on  the  mathematical 
structure  cf  the  models.  In  that  section  conceptual  differences 
in  the  thinqs  we  compared  were  ignored .  Fcr  example ,  the  e- 
messages  of  the  Infological  Model  were  compared  with  the  tuples 
of  the  Belaticnal  Model.  In  this  section  we  take  note  of  the 
concepts  which  overlay  the  mathematical  structures  of  the  models. 
Since  a  model’s  conceptual  organization  effects  its  mathematical 
structure,  there  will  be  some  overlap  with  the  discussion  of 
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secticn  3,1.3.  However,  here  the  euphasis  will  be  on  ccncepts, 
their  Beaning  and  their  implicaticns. 

For  most  of  the  models  the  real  world  is  given  structure 
with  the  ccncepts  of  entity,  property,  value  and  relationship. 
The  variations  of  these  concepts  among  the  models  are  two-fold. 
First  there  are  differences  in  terminology.  Comparable  terms  are 
listed  in  Table  4.  The  second,  more  substantial  differences 
involve  concept  usage.  Seme  models  do  not  employ  all  ccncepts 
and  the  ores  that  are  used  are  not  used  equivalently. 

The  first  question  we  should  ask  is,  what  is  an  entity? 
Eicticnary  definitions  are  generally  used  as  a  first 
approximation.  i^n  entity  is  something  that  has  reality  and 
distinctness  of  being  in  fact  or  in  thought.  Cf  course,  the  same 
can  be  said  for  properties,  values  and  rela tion ships.  What 
distirguisbes  entities  from  the  others  is  net  something  intrinsic 
tut  something  subjective:  interest.  An  entity  has  interest  for 
the  individual  or  enterprise.  Bela tion ships,  properties  and 
values  are  alsc  cf  interest,  but  only  to  the  degree  they  provide 
information  about  entities. 

Seme  models  use  entities  and  relationships  only.  CIAM  and 
At  rial's  model  are  among  these.  The  requirement  that  entities  be 
of  interest  has  taken  less  importance  in  these  models  since  there 
is  no  separate  class  of  values.  This  is  justified  in  DIAK-I  by 
noting  that  something  like  "red"  might  be  of  interest,  even  if  it 
isn't  at  the  moment.  Atrial  does  distinguish  between  concrete 
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and  abstract  objects.  Abstract  objects  correspond  roughly  to 
values,  but  concrete  objects  are  definitely  entities. 

In  some  sense  the  Pelaticnal  Model  also  is  in  this  category. 
Specifically,  the  Relational  Model  can  be  said  to  talk  about  only 
values  and  relationships.  This  is  so  because  relations  represent 
the  intension  as  well  as  the  gjcte  rsion  of  the  data,  ie.,  the 
meaning  and  representation  of  the  data. 

The  next  group  of  models  has  entities,  properties  and 
values.  It  includes  the  Data  Space  Model,  Codasyl*s  Information 
Algebra  and  Lindgreen's  Model.  Since  entities  can  be  related 
only  through  their  properties,  the  concept  of  identifier  is 
necessary  at  the  conceptual  level.  An  identifier  is  any  property 
which  puts  the  value  set  in  one-to-one  correspondence  with  the 
entities  having  the  property. 

The  Infological  Model  and  Kobayashi's  Information  Algebra 
are  examples  of  models  in  which  entities  have  properties  and  may 
also  participate  in  relations. 

Among  those  models  using  entities,  properties,  values  and 
relationships,  Chen’s  Entity-Relationship  Model  is  the  most 
general,  here  both  entities  and  relationships  have  properties. 

The  concept  of  entity  set  or  entity  type  is  important  in 
many  models.  This  concept  is  related  to  that  of  property,  and 
the  relationship  takes  one  of  two  forms.  Either  entity  sets  are 
first  distinguished  and  then  a  set  of  applicable  properties  is 
defined,  or  else  the  totality  of  properties  is  defined  and  the 
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entity  sets  are  defined  as  cobboh  doaains  of  certain  groups  of 
properties.  Ey  defining  properties  first,  the  noticn  of  a 
collection  of  entity  sets  having  coaaon  properties  is  quite 
natural.  On  the  other  hand,  by  defining  the  entity  sets  first, 
the  sets  are  net  dependent  on  their  properties  and  consequently 


properties  may  easily  be  added  or 

If  a  aodel  has  the  notion  o 
type  of  an  entity  aust  be  co 
overlapping  entity  types  can 
require  that  entity  sets  be  disjo 
this  allows  unambiguous  naming  of 
cannot  be  masked  by  an  entity  set 
1S75],  however,  that  disjoint 
types  to  be  declared  for  effectiv 
one  entity  type  is  often  a  sub 
example,  a  manager  is  also  an 
ordinary  employees  do  not.  The 
overlapping  entity  sets,  neither 
instance,  in  a  lotion  picture  co 
set  of  directors  overlap.  Hall  e 
collection  of  entity  sets  shou 
operations  of  union,  intersectio 
notes  that  entity  sets  need 
models,  however,  effectively  d 
referred  to  by  Moulin.  N 
representation  level  for  indicati 
or  more  entity  sets.  An  entity 


removed. 

f  entity  type,  the  notion  a  sub- 
nsidered.  The  possibility  of 
also  arise,  Moulin  €t,al.£1976] 
int.  Their  reasons  are  that 
entities  and  that  relationships 
,  It  has  been  noted  [Brown 
entity  types  can  cause  too  many 
€  control.  DIAM-I  notes  that 
set  of  another  entity  type.  For 
employee  and  has  properties 
more  general  case  is  that  of  two 
contained  in  the  other.  For 
mpany,  the  set  of  actors  and  the 
t  al.  [1976]  have  noted  that  the 
Id  be  closed  under  the  usual  set 
n  and  difference.  Chen  also 
not  be  disjoint.  None  of  these 
eals  with  the  taming  problem 
0  provision  is  made  at  the 
ng  that  an  entity  belongs  to  two 
is  generally  represented  by,  or 


51 


at  least  identified  by,  exactly  ore  value  from  one  of  a  disjoint 
set  of  domains. 

The  basic  definition  of  entities  as  the  main  objects  of 
interest  with  properties,  values  and  relationships  subordinate  to 
them  is  modified  when  relationships  are  allowed  to  have 
properties  as  in  the  models  of  Chen,  Hall  et  al.,  and  Eenci  et 
al.  [1976].  Without  relationship  properties,  when  a  relationship 
becomes  interesting  in  itself  it  becomes  an  entity  and  the  roles 
played  by  each  entity  in  the  relationship  now  themselves  become 
relationships.  Such  changes  in  the  model  may  be  avoided  by 
letting  relationships  become  interesting  with  their  own 
properties  without  requiring  that  they  become  entities. 

There  are  now  just  two  differences  between  entities  and 
relationships.  Relationships  can  only  exist  when  the  related 
entities  also  exist  and  relationships  cannot  themselves  partake 
in  relationships.  The  first  difference  is  inherent  in  the 
meaning  of  relationship.  The  second  difference  is  actually  a 
restriction  of  the  particular  models.  In  fact,  the  Tnfological 
model  and  any  model  admitting  unnormalized  relations  will  allow 
relationships  among  relationships.  As  expected,  having 
relationships  with  or  without  properties,  among  or  not  among 
other  relationships  dees  not  effect  the  expressive  power  of  the 
model  tut  it  does  effect  the  presence  of  so-called  excess 
entities  [Hall  et  al.  1976J. 

Some  models  allow  qualifications  in  the  dependence  of 
relationships  on  the  existence  of  the  participating  entities. 
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ScEcetiies  a  rcle  in  a  relationship  nay  be  allowed  to  be  vacant. 
For  example,  relational  attributes  not  participating  in  the  prime 
key  lay  have  null  values.  In  the  ether  direction,  a  role  may  be 
allowed  to  have  more  than  one  participating  entity.  For 
instance,  »brial  uses  the  example:  "Peter  was  invited  Paul  and 
Jane  tc  Paris  on  July  the  15th  of  1973." 

Excess  entities  are  entities  in  which  there  is  no  real 
interest  tut  which  must  be  defined  because  cf  some  restriction  in 
the  model.  P.  related  notion  is  that  of  an  excess  relationship, 
that  is,  a  relationship  connecting  two  entities,  one  of  which  was 
formerly  a  relationship.  We  stress  that  the  concepts  of  excess 
entity  and  excess  relationship  are  subjective  ideas.  Moreover, 
they  dc  have  important  semantic  interpretations,  as  will  be  seen 
shortly. 

The  only  entity  model  which  does  not  need  excess  entities  is 
the  Infolcgical  Model.  All  e-messages  are  important  to  the 
Infolcgical  model,  sc  that  excess  entities  dc  net  arise, 

Erewn  [1975]  has  described  a  real  world  model  intended  to  be 
mapped  to  the  CETG  model.  In  models  such  as  this,  excess 
entities  cccur  when  nonfunctional  relationships  must  be  expressed 
as  entities  with  domains  connected  to  the  relationship  entities 
by  what  are  essentially  projections.  The  Entity-P elation  ship  and 
Functicnal  Models  also  use  this  construct. 

Another  kind  cf  excess  entity  occurs  when  the  model  only  has 
binary  relaticnships.  In  this  case  an  n-ary  relationship  must 
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appear  as  an  entity.  The  rcle  of  each  entity  in  this 
relationship  is  new  itself  a  relaticnship. 

Kith  these  two  kinds  of  excess  entity  the  term  "excess"  is 
in  soue  sense  objective  because  it  may  very  well  be  that  no  one 
is  interested  in  the  entities  created.  It  may  alsc  happen  that 
one  application  becomes  interested  in  a  relaticnship  and  gives  it 
a  property  or  involves  it  in  a  relaticnship.  If  the  model 
requires  that  the  relationship  become  an  entity  there  may  be 
ether  applications  which  consider  this  an  excess  entity,  Senko 
[1975t]  has  defined  a  VIA  option  in  the  FCBAL  language  which 
allows  excess  entities  to  be  skipped  ever.  Thus  applications  not 
interested  in  an  entity  need  not  see  it. 

It  is  often,  said  that,  because  a  binary  relationship  is 
miany-tc-man  y,  it  can  become  an  entity  in  its  own  right  with  its 
own  properties.  This  is  misleading.  Ey  the  process  of 
abstraction,  any  (binary)  relaticnship  can  become  an  entity. 
However,  if  a  functional  binary  relaticnship  has  properties  of 
interest,  tut  one  does  not  desire  that  it  become  an  entity,  its 
properties  may  be  attributed  to  the  entities  in  its  domain 
without  losing  any  information.  Two  problems  can  arise  when  one 
attributes  properties  of  a  function  to  its  domain.  First,  if  the 
function  is  not  total,  the  property  is  forced  to  assume  null 
values.  Second,  if  the  function  is  one-to-one,  the  arbitrary 
decision  must  be  made  whether  the  property  will  be  attributed  to 
the  domain,  the  range  or  both.  This,  of  course,  is  not  possible 
with  many- to-man y  relationships. 
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We  have  noted  that  relationships  can  have  qualities  very 
much  like  entities,  and  that  they  differ  mainly  in  their 
dependence  on  the  existence  of  the  related  entities.  It  is  also 
pcssitle  fcr  entities  to  have  such  existence  dependence.  This 
dependence  may  te  genuine  as  in  the  case  of  an  n-ary  relationship 
entity  in  a  binary  model  or  the  dependence  may  really  mean  that 
the  interest  of  one  entity  depends  on  the  interest  in  ethers. 
Chen  defines  such  dependent  entities.  For  example,  an  employee's 
children  are  only  of  interest  while  the  employee  works  for  the 
company.  This  qualified  interest  is  similar  to  the  interest  in 
values  describinq  entities.  Thus  dependent  entities  resemble  in 
seme  respects  values.  Deheneffe  et  al.  [1974]  also  define 
existence  properties  of  entities, 

Schmid  and  Swenson  [  1975  ]  have  defined  a  real  world  view  for 
representation  by  relations  which  is  based  on  existence 
dependence  of  objects.  In  this  view  the  world  consists  of  1) 
independent  objects  have  an  existence  of  their  own  and  which  may 
enter  into  associations  with  one  another  and,  2)  characteristic 
objects  which  exist  only  while  they  are  the  describing  parts  of 
characteristic  relationships.  Characteristic  objects  can  have 
characteristics  but  cannot  be  associated  with  other  objects. 
There  are  two  ways  to  describe  this  model  in  the  entity-property- 
relaticnship  terminology.  We  can  say  that  the  independent 
objects  are  the  entities  and  the  characteristic  objects  are  the 
properties.  Then  we  have  a  model  in  which  properties  themselves 
have  properties.  Alternatively,  both  independent  and 
characteristic  objects  can  be  two  classes  of  entities; 
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teristic  relationships  are  two  types  of 
re  no  properties.  Then  we  have  one 
stence  dependencies.  We  chose  the  first 
is  model  in  Table  4.  Note  that  the 
1  also  allows  characteristics  of 


The  cuesticD  of  when  entities  are  recognized  to  exist  is 
related  to  the  ability  to  identify  or  name  them.  We  distinguish 
three  ways  to  relate  existence  and  identifiers.  The  most 
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identifying  properties,  we  are  regarding  an  entity  as  the  bundle 
of  property  values  associated  with  it  [Langefors,  1966]. 
Irforiraticn  J^lgetra  is  an  example  of  this  approach. 

P  less  restrictive  way  is  to  allow  an  entity  to  be 
identified  by  its  properties  and  relationships  with  other 
entities.  As  we  have  already  noted,  Chen's  dependent  entities 
are  of  this  type.  The  identity,  and  therefore  the  existence,  of 
this  type  of  entity  is  then  dependent  on  other  entities. 

?  number  of  models  recognize  the  existence  of  entities  which 
are  not  determined  uniquely  by  their  properties  or  relationships. 
This  does  not  cause  any  difficulties  at  the  conceptual  level. 
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refEesenting  pcssitly  nondistinguishable  entities.  To  represent 
such  entities.  Hall  et  al,  use  the  notion  of  surrogate,  /“brial's 
fflcdel  has  internal  names.  Eenci  et  al.  use  the  concept  of 
entity,  but  with  a  connotation  different  frcra  the  one  we  have 
used  thusfar.  ' These  internal  identifiers  are  values,  each  of 
which  represents  one  entity,  but  which  have  no  meaningful 
external  form.  One  unan  swered  question  concerning 
ncndist inguishafcle  entities  is  hew  their  representations  are 
manipulated  inside  the  data  base.  If  it  is  planned  that  certain 
entities  be  nondistinguisha tie,  the  concept  of  a  group  entity 
might  te  needed.  Eenci's  notion  cf  "measure  of  existence"  is 
semewhat  similar.  If  nond ist i rguish able  entities  appear  at 
random,  for  example,  multiple  voters  registering  with  the  same 
naie  and  address,  there  is  no  way  to  handle  them  separately 
without  adding  an  artificial  distinguishing  property. 

The  discussion  so  far  in  this  section  has  considered  various 
aspects  of  the  entity  concept  as  they  appear  in  different  models. 
The  entity  concept,  however,  is  not  universally  accepted  as  an 
appropriate  paradigm, 

Eracchi  et  al.  [1976]  explicitly  reject  the  idea  of  entity 
at  the  conceptual  schema  level.  Their  model  is  based  on  binary 
relations  defined  over  "Sets  cf  Ccncepts",  The  exact  nature  of  a 
"Ccncept”  is  net  specified  but  one  characteristic  is  that  it  is 
the  smallest  unit  of  information  which  is  amenable  to  processing. 
The  proper  place  for  entities  is  in  application  views.  Ccncepts 
serve  as  basic  building  blocks  for  entities.  The  authors  do 
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allow  concepts  in  the  conceptual  schena  mcdel  to  be  built  up  from 
mere  elementary  concepts.  For  example,  a  DATE  is  built  up  from  a 
DAY,  a  MONTH  and  a  YEAR. 


Curchholz  and  Richter  never  even  use  the  term  ”entity". 
Irforiraticn  is  structured  directly  by  the  concept  of  "construct'*. 
Projecting  structure  onto  the  real  world  is  ret  necessary  in  this 
model  because  the  mcdel  is  intended  to  be  a  conceptual  system  to 
describe  the  universe  of  discourse  between  the  users  and  the 
DEKS.  If  the  user  wants  to  communicate  in  terms  cf  entities, 
this  structure  is  built  on  top  of  the  construct  structure. 


The  Extended  Set 

and 

views  every  level 

of 

conceptual  structure 

the 

Thu 

s  constructs,  sets 

of 

set 

theoretically. 

Theory  Model  of  Childs  goes 
abstraction  in  terms  of 
user  wishes  may  be  built  on 
concepts  or  entities  could 


even  further 
sets.  Any 
top  cf  this, 
be  defined 


T.E. 

Steel  [  1975] 

has 

also 

rejected  the 

notion  of  entit 

y  as 

pa 

rt 

c  f  a 

fcrmalisir  for 

the 

conceptual  level. 

His  view  is 

that 

se 

t 

theory  embedded  in 

a 

modal 

logic  is  the 

proper  choice. 

The 

ar 

gument , 

like  that  of 

Child's, 

is  that  o 

nly  set  theory 

is 
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3 ^  5126  Semantics  cf  Data  Models 

There  are  many  approaches  tc  and  definitions  of  "semantics". 
The  semantic  network  model  which  we  consider  is,  in  fact,  one  of 
these  approaches.  The  method  we  choose  is  based  on  linguistics. 
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While  we  dc  not  claim  that  this  approach  is  the  test  for  all 
purposes,  it  works  well  in  classifying  data  models. 

Cne  linguistic  approach  to  semantics  is  Predication  Analysis 
[Leech,  1974],  Predication  /analysis  deals  with  the  semantic 
structure  of  sentences.  The  analysis  process  involves  the 
identification  of  the  arguments  and  predicates  of  a  predication. 
A  predication  has  one  predicate  and  may  have  zero,  one,  or  two 
arguments.  Moreover,  the  arguments  may  themselves  be 
predications,  thereby  allowing  tree  structures  to  be  built  up. 
Further,  certain  predications,  called  downgraded  predications, 
are  the  semantic  equivalents  of  the  adjective  and  adverbial 
fuEcticns  of  syntax.  Those  predications  representing  the 
adjective  furcticns  of  syntax  qualify  arguments  while  those 
representing  the  adverbial  furcticns  of  syntax  modify  predicates. 
For  example,  the  sentence 

"Suppliers  supply  parts  to  departments  in  specific  vclume", 
would  be  analyzed  as  follows: 

1.  The  main  predication  structure  is  "suppliers  supply 
parts"  with  suppliers  and  parts  serving  as  arguments  and  the  verb 
supply  being  the  predicate  of  the  predication. 

2.  The  prepositional  phrases  "to  departments"  and  "in 
specific  volume"  are  downgraded  predications  performing  the 
adverbial  furcticns  of  syntax.  Moreover,  these  downgraded 
predications  modify  the  predicate  "supply". 

If  we  study  how  the  models  represent  the  main  predication 
and  its  associated  downgraded  predications,  we  should  be  able  to 
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conclude  hew  they  map  the  senantic  predication  structures  into 
information  structures.  It  results  that,  depending  on  the  model, 
these  irfermation  structures  embody  various  semantic  '’levels  of 
atstracticn"  which  range  from  surface  semantics  to  deep 
semantics . 


Thus  we  can  introduc 
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Those  models  which  most  closely  resemble  the 
semantics  cf  a  sentence  (or  a  complete  discourse)  are  the 
Helationship  model  and  the  Semantic  Network  model.  Cur 
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relationship . 
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part  and  departmerits  by  arcs  labp.l  led  agent,  object  and 
destination,  respectively.  The  voluse  supplied  is  a 
characteristic  of  the  event  ncde. 

Eoth  Biodels  consistently  identify  relationship  sets  or  event 
nodes  with  verbs.  At  tiroes  this  seeros  a  bit  artificial, 
especially  when  the  relationship  is  functional,  eg.,  ’’Every 
eitployee  is  lanaged  by  a  unique  employee”.  The  Fntity- 
Helaticnship  model  creates  a  relationship  set  named  MANAGEMENT 
connected  to  the  EEFLCYFE  entity  set  by  two  arcs  labelled  MANAGED 
and  MANAGES.  Chen  argues  that  although  a  relationship  may  be 
functional  at  present,  ie.  N:1,  it  may  become  relational,  ie., 
N:M,  and  should  therefore  be  modelled  as  a  relationship.  A  more 
convincing  arguroent  might  be  that  he  is  trying  to  model  semantic 
surface  structure. 

Another  class  of  models  captures  some  aspects  of  both 
surface  and  deep  semantic  structure.  Ey  deep  structure  we  mean 
that  functional  predications  such  as  MANAGEMENT  are  represented 
by  arcs  (properties,  attributes,  functions)  rather  than,  nodes 
(relations,  relationships,  sets).  Thus  in  graph  models,  arc 
labels  may  be  ver b  phrases  since  the  ncde  representing  the  verb 
may  be  eliminated.  The  models  in  this  class  are  the 
DETG/CCEAS YL,  Eelational,  Infclogical,  and  Eunctional. 

The  relations  of  the  Pelational  and  Infological  models 
represent  semantic  ’’fragments"  corresponding  tc  arguments  and 
predications.  The  attributes  of  predication  relations  represent 
adverbial  irfcrroation  as  well  as  ’’logical  pointers"  to  argument 
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relations .  In  the  CE TG  and  Functional  models,  the  arcs  between 
predication  nodes  and  argument  nodes  provide  the  same  semantic 
structural  information  given  by  the  "logical  pointers"  of  the 
non-graph  models. 

Note  however  that  it  is  possible  to  convert  the  models  of 
this  mixed  class  to  strict  surface  semantics  by  including 
relations  or  nodes  representing  these  functional  predications. 
Mathematically,  this  corresponds  to  viewing  a  binary  relation  as 
a  set  of  ordered  pairs  rather  than  as  a  mapping. 

^t  the  deep  semantic  level  we  have  the  DIAM  II,  Data 
Semantics,  and  Einary  Logical  Association  models.  At  this  level 
the  data  items  or  values  are  viewed  as  entity  sets  or  Sets  of 
Concepts  related  by  means  of  binary  relations.  In  these  models 
predication  nodes  are  unnecessary  as  long  as  the  predications 
have  no  downgraded  predications  (adverbial  information)  modifying 
them.  In  this  case  they  are  binary  relations,  but  when  adverbial 
information  such  as  "to  departments"  and  "vclume"  must  be 
represented,  these  models  create  an  "excess"  node  and  relate 
adverbial  information  to  that  node.  In  Senkc's  FORAL  language,  a 
VIA  option  is  provided  to  skip  over  these  "excess"  entities 
because  his  language  does  not  use  verbs.  However,  from  the 
semantic  point  of  view  these  "excess"  nodes  are  important  since 
they  are  the  natural  place  to  attach  downgraded  predications. 

Models  such  as  Information  Management  Concepts,  Information 
Algebra,  and  Extended  Set  Theory  actually  span  all  sema nt ic 
levels  of  abstraction  due  tc  their  purely  set  theoretic  nature. 
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Thus  these  models  can  be  used  to  construct  the  view  which  best 
suits  a  particular  application.  We  term  these  semantics  "Pre- 
verbal"  since  non-verbal  thought  processes  are  involved  in 
obtaining  these  data  model  information  structures. 

Tc  £ummari2e,  we  note  that  surface  semantic  models  capture 
the  way  we  speak  and  ccmm unica te  verbally  with  one  ancther.  The 
mixed  semantic  level  represents  functional  relationships  as 
properties  of  an  entity.  lastly  at  the  deep  semantic  level, 
entities  relinquish  their  importance  to  data  items  which  are 
treated  as  entities.  Associations  among  entities  (in  graph 
models)  are  labelled  with  long  verb  phrases  which  describe  the 
relationship. 

Although  the  notion  of  a  semantic  spectrum  is  strictly  a 
conceptual  device,  it  does  help  to  explain  variations  among 
models  which  cannot  easily  be  expressed  in  terms  of  structural  or 
conceptual  comparisons. 

^  •  Conclusions 

In  this  paper  we  have  presented  a  taxonomy  framework  for  the 
classification  of  a  collection  of  conceptual  data  models.  The 
framework  parameters  are:  structural  properties,  logical  access, 
conceptual  comparisons  and  terminology,  and  the  semantics  of  data 
ircdel  £ . 

In  studying  the  structural  properties  of  data  models,  one 
immediately  notices  that  some  provide  an  underlying  graph,  while 
others  do  not.  For  models  which  have  a  graph  we  indicate  the 
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meanings  associated  «ith  the  nodes  and  arcs  in  Table  1.  Models 
net  using  a  graph  representaticr  tend  to  be  more  set  theoretic. 
They  differ  in  the  way  the  sets  are  organized  as  to  tuple  size 
and  whether  these  tuples  are  flat  or  nested,  P.  classification  of 
these  irodels  is  given  in  Table  2, 

Another  structural  aspect  of  a  data  model  is  its 
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In  the  section  on  logical  access  we  indicate  the  conceptual 
differences  in  logical  access  for  graph  oriented  models  versus 
set  oriented  models. 

Cne  cf  the  most  difficult  parameters  of  our  taxonomy 
framehork  is  that  of  conceptual  comparisons  and  terminology. 
This  parameter  is  sc  difficult  tc  handle  because  1)  models  use 
different  terminclcgy  to  describe  the  same  concept,  and  2)  the 
same  terminology  is  sometimes  used  to  describe  different 
concepts.  One  of  the  basic  findings  cf  our  examination  of 
terminology  is  that  some  models  introduce  the  notion  of  an  entity 
while  ethers  dc  not.  For  those  that  do  use  this  notion,  we  found 
that  various  forms  of  the  triplet  ”entity-relationship-property" 
were  cemmen  to  them.  Table  4  provides  a  listing  of  those  models 
using  entities  and  the  terminology  they  use  in  describing  the 
"enti ty-relaticnship- property”  triplet. 
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The  last  component  of  our  framework  deals  with  the  semantics 
of  the  data  models.  The  models  can  be  classified  initially  into 
two  classes:  those  which  follow  a  "suhject-predicate-cb ject” 
semantic  approach,  and  those  that  dc  not.  To  those  models 
falling  into  the  first  category,  we  further  refine  the 
classification  based  on  whether  the  models  capture  either  surface 
semantics,  deep  semantics,  or  a  mixture  cf  both.  Further,  some 
models  use  pre-verbal  semantics  in  formulating  information 
structures.  Table  5  places  the  data  models  along  this  '’semantic 
spectrum”. 
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Ps  stated  in  the  introduction,  we  do  not  propose  a  meta-data 
ircdel,  nor  do  we  suggest  that  any  one  model  is  better  than  the 
ethers.  However,  we  have  attempted  to  provide  a  framework  in 
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which  the  reader  can  meaningfully  evaluate  the  data  aodels, 
perhaps  classify  his  own. 


and 
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